AITopics | lemma 6

Collaborating Authors

lemma 6

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

Balasubramanian, Krishnakumar

arXiv.org Machine LearningMay-26-2026

We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the difference of the kernel-smoothed data score and the kernel-smoothed model score. This velocity is a gradient field, addressing the non-conservatism issue identified for general displacement-based drifting fields. We prove continuous-time finite-particle convergence bounds for the conservative method on $\R^d$: a joint-entropy identity yields bounds for the empirical Stein drift, the smoothed Fisher discrepancy of the KDE, and the squared center velocity. The main finite-particle correction is a reciprocal-KDE self-interaction term, and we give deterministic and high-probability local-occupancy conditions under which this term is controlled. We keep the quadrature constants explicit and track their possible bandwidth dependence: the root residual-velocity rate $N^{-1/(d+4)}$ holds under an additional $h$-uniform quadrature regularity condition, while a more general growth condition yields the optimized root rate $N^{-(2-β)/(2(d+4-β))}$, where $0\le β<2$. We also analyze the non-conservative drifting method with Laplace kernel, corresponding to the original displacement-based velocity proposed in Deng et al., 2026 (arxiv:2602.04770). For this method, a sharp companion kernel decomposes the velocity into a positive scalar preconditioning of a sharp-score mismatch plus a Laplace scale-mismatch residual, producing an analogous finite-particle rate with an unavoidable residual term. Finally, we explain how the continuous-time residual-velocity bounds translate into one-step generation guarantees through the explicit drift size $η$.

artificial intelligence, kernel, machine learning, (16 more...)

arXiv.org Machine Learning

2605.22795

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)

Add feedback

On the Sample Complexity of Robust Binary Hypothesis Testing

Vallinayagam, Shankar, Pensia, Ankit, Jog, Varun

arXiv.org Machine LearningMay-26-2026

We study the sample complexity of robust binary hypothesis testing under three standard contamination models: $\varepsilon$-additive (Huber), $\varepsilon$-subtractive, and $\varepsilon$-total variation (TV), denoted by $n^*_{\mathrm{Hub}}(\varepsilon)$, $n^*_{\mathrm{Sub}}(\varepsilon)$, and $n^*_{\mathrm{TV}}(\varepsilon)$, respectively. For subtractive contamination, we show that least favourable distributions exist and provide explicit formulas for the same, bringing this model in line with the classical Huber and TV models. Next we show that in all three models, sample complexity may be highly unstable in the contamination parameter $\varepsilon$, increasing by polynomial factors even for $o(\varepsilon)$ perturbations. Similarly, there may be polynomial factor gaps between the sample complexities when $\varepsilon$ is known exactly versus when it is known up to $o(\varepsilon)$ error. Despite the instability of the sample complexity in all models, we show that the sample complexities across models are comparable up to constant-factor rescaling of $\varepsilon$. Specifically, for any fixed $δ_0>0$, the following hold for all distributions $p$ and $q$: (i) $n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(2\varepsilon)$, (ii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((2+δ_0)\varepsilon)$, and (iii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((1+δ_0)\varepsilon)$, and the scaling constants are tight. Finally, we extend our results to adaptive versions of the contamination models.

artificial intelligence, contamination, scientific discovery, (16 more...)

arXiv.org Machine Learning

2605.24741

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.62)

Add feedback

On efficient robust regression with subquadratic samples

Adil, Deeksha, Błasiok, Jarosław, Chen, Hongjie, Sridharan, Deepak Narayanan

arXiv.org Machine LearningMay-19-2026

We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $κ$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/ε^4)$ samples, where $d$ is the dimension and $ε$ is the corruption rate, and achieves prediction error $O(\sqrt{εκ})$ under the condition $εκ\lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{εκ})$ when $εκ\lesssim 1$ require queries that take $Ω(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $εκ\lesssim 1$, efficient algorithms may require $\tildeΩ\left(\min\{dε^{2}κ^{2},\ ε^{2}d^{2}\}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.

artificial intelligence, machine learning, survey article, (20 more...)

arXiv.org Machine Learning

2605.18042

Country: North America > United States (0.27)

Genre:

Research Report (0.50)
Overview (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback

Sliced Inner Product Gromov-Wasserstein Distances

Gong, Xiaoyun, Rioux, Gabriel, Goldfeld, Ziv

arXiv.org Machine LearningMay-12-2026

The Gromov-Wasserstein (GW) problem provides a framework for aligning heterogeneous datasets by matching their intrinsic geometry, but its statistical and computational scaling remains an issue for high-dimensional problems. Slicing techniques offer an appealing route to scalability, but, unlike Wasserstein distances, GW problems do not generally admit closed-form solutions in one-dimension. We resolve this problem for the GW problem with inner product cost (IGW), propose a sliced IGW distance that enjoys a natural rotational invariance property, and comprehensively study its structural and computational properties. Numerical experiments validating our theory are presented, followed by applications to heterogeneous clustering of text data and language model representation comparison.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2605.08546

Country: North America > United States (0.92)

Genre: Research Report (0.63)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Universal Feature Selection with Noisy Observations and Weak Symmetry Conditions

Tang, Dier, Han, Guangyue

arXiv.org Machine LearningMay-12-2026

This paper relaxes the restrictive symmetry conditions adopted in [4], [5] and extends their universal feature selection framework to accommodate noisy observations as well as attribute structures that may exhibit directional preferences. We introduce the notion of weak spherical symmetry, quantified by second-moment distances, which allows controlled deviations from rotational invariance. Under this relaxed condition, we develop a universal feature selection framework based on the singular value decomposition of the canonical dependence matrix computed from noisy data. Our main result shows that the selected features achieve asymptotically optimal error exponents up to a residual term that depends on the symmetry deviation $δ$ and the noise levels $η_1, η_2$. When $δ, η_1, η_2$ are relatively small, our result recovers that of [5], thereby demonstrating that exact spherical symmetry is unnecessary. Overall, our findings highlight the robustness of the selection framework against second-moment deviations and observation noise, thereby broadening its applicability across diverse inference tasks and providing a theoretically grounded tool for universal feature selection in practical scenarios.

artificial intelligence, machine learning, symmetry, (13 more...)

arXiv.org Machine Learning

2605.09396

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Optimal Experiments for Partial Causal Effect Identification

Maringgele, Tobias, Etesami, Jalal

arXiv.org Machine LearningMay-11-2026

Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly. We study the problem of selecting, prior to observing experimental outcomes, a cost-constrained subset of experiments that maximally tightens bounds on a target query. We formalize this as the max-potency problem, where epistemic potency measures the worst-case reduction in bound width guaranteed by an experiment, and show that this problem is NP-hard via a reduction from 0-1 knapsack. Building on the polynomial-programming framework of Duarte et al. (2023), we give a general procedure for evaluating epistemic potency in discrete settings. To control the super-exponential search space, we introduce two graphical pruning criteria that depend only on the causal graph and the query: a novel path-interception rule that exploits district structure to certify zero potency in linear time, and an identifiability check based on the ID algorithm. On Erdos-Renyi random graphs and 11 bnlearn benchmark networks, the two criteria together prune 50-88% of candidate experiments on average without solving a single polynomial program. For the general subset search, we show that ID-pruned experiments are combinatorially inert, yielding a super-exponential reduction in the number of subsets evaluated. We close with an end-to-end demonstration on observational NHANES data, selecting optimal experiments for estimating the effect of physical activity on diabetes.

artificial intelligence, experiment, machine learning, (18 more...)

arXiv.org Machine Learning

2605.06993

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.34)

Add feedback

The Causal Description Gap: Information-Theoretic Separations Across Pearl's Hierarchy

Emadi, Seyed Morteza

arXiv.org Machine LearningMay-5-2026

Pearl's causal hierarchy shows that observational, interventional, and counterfactual queries are qualitatively distinct. We ask a quantitative version of this question: how many additional bits are needed to specify higher-rung causal answers once lower-rung answers are known? We formalize this via query-class description length, the Kolmogorov complexity of the answer oracle induced by an SCM for a class of queries. Our main construction gives binary acyclic SCMs whose observational distribution has constant description length, while the single-variable interventional answer oracle has description length $Θ(n^2)$. A degree-sensitive upper bound shows that finite-gate-schema SCMs of indegree $d$ have observational-interventional gap at most $O(nd \log(en/d) + n \log n)$, making the quadratic construction order-optimal in the dense regime and a rooted-tree construction order-optimal for bounded indegree. The quadratic separation persists under $\varepsilon$-accurate total-variation descriptions for every fixed $\varepsilon < 1/4$. At the next rung, the full hard-do interventional oracle can still leave a $Θ(n)$ counterfactual description gap. A general ambiguity-to-bits theorem and Shannon analogue show that these gaps equal the logarithm of residual higher-rung ambiguity up to lower-order terms.

artificial intelligence, int1, machine learning, (17 more...)

arXiv.org Machine Learning

2605.02177

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.35)

Add feedback

The Limits of Learning with Missing Data

Brian Bullins, Elad Hazan, Tomer Koren

Neural Information Processing SystemsMay-1-2026, 05:37:51 GMT

We study linear regression and classification in a setting where the learning algorithm is allowed to access only a limited number of attributes per example, known as the limited attribute observation model. In this well-studied model, we provide the first lower bounds giving a limit on the precision attainable by any algorithm for several variants of regression, notably linear regression with the absolute loss and the squared loss, as well as for classification with the hinge loss. We complement these lower bounds with a general purpose algorithm that gives an upper bound on the achievable precision limit in the setting of learning with missing data.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

Add feedback